Unsupervised Word Segmentation in Context
Abstract
This paper extends existing word segmentation models to take non-linguistic context into account. It improves the token F-score of a top-performing segmentation model by 2.5% on a dataset of 27k utterances. We posit that word segmentation is easier in context because the learner is not trying to access irrelevant lexical items. We use topics from a Latent Dirichlet Allocation (LDA) model as a proxy for “activity” contexts and use them to label the Providence corpus. We present Adaptor Grammar models that use these context labels, and we study their performance with and without context annotations at test time.
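The token F-score cited above is a standard word-segmentation metric: a predicted word token counts as correct only when both its left and right boundaries match a gold token in the same utterance. The sketch below illustrates that metric in Python, assuming each utterance is given as a list of non-empty word strings. The function names `token_spans` and `token_prf` and the toy utterances are hypothetical and are not taken from the paper or its evaluation scripts.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of one word token


def token_spans(words: List[str]) -> Set[Span]:
    """Map a segmented utterance to the set of (start, end) spans of its words."""
    spans: Set[Span] = set()
    start = 0
    for w in words:
        end = start + len(w)
        spans.add((start, end))
        start = end
    return spans


def token_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Token precision, recall and F-score pooled over a corpus of utterances.

    A predicted token is correct only if both of its boundaries match
    a gold token, i.e. its exact span appears in the gold segmentation.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in length")
    correct = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        # Both segmentations must cover the same unsegmented string.
        if "".join(g) != "".join(p):
            raise ValueError("segmentations cover different strings")
        gs, ps = token_spans(g), token_spans(p)
        correct += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    gold = [["the", "dog"], ["you", "want", "it"]]
    pred = [["thedog"], ["you", "wantit"]]  # undersegmented toy output
    print([round(x, 3) for x in token_prf(gold, pred)])  # [0.333, 0.2, 0.25]
```

Counts are pooled across utterances before dividing, so long utterances weigh more than short ones. This sketch assumes that convention rather than a per-utterance average.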
Similar resources
Unsupervised Texture Image Segmentation Using MRFEM Framework
Texture image analysis is one of the most important areas of image processing in medical sciences and industry. To date, different approaches have been proposed for segmentation of texture images. In this paper, we propose unsupervised texture image segmentation based on a Markov Random Field (MRF) model. First, we used Gabor filters with different parameters (frequency, orientatio...
Joint Word Segmentation and Phonetic Category Induction
We describe a model which jointly performs word segmentation and induces vowel categories from formant values. Vowel induction performance improves slightly over a baseline model which does not segment; segmentation performance decreases slightly from a baseline using entirely symbolic input. Our high joint performance in this idealized setting implies that problems in unsupervised speech recog...
Exploring unsupervised word segmentation for machine translation in the South African context
We explore the application of unsupervised word segmentation algorithms to phrase-based statistical machine translation (SMT) systems, translating from English to four South African languages: Afrikaans, Northern Sotho, Tsonga and Zulu. Positive results in terms of the standard BLEU and NIST scores are obtained for systems translating into Afrikaans and Zulu.
Protein secondary structure detection based on unsupervised word segmentation
Unsupervised word segmentation methods were applied to analyze protein sequences. Protein sequences, such as “MTMDKSELVQKA...,” were used as input to these methods. Segmented “protein word” sequences, such as “MTM DKSE LVQKA,” were then obtained. We compared the “protein words” derived via unsupervised segmentation and protein secondary structure segmentation. An interesting finding is that uns...
Publication date: 2014